Smart Paper Notes

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling

Lei Shang , Mouxiao Huang , Wu Shi , Yuchen Liu , Yang Liu , Fei Wang , Baigui Sun , Xuansong Xie , Yu Qiao

Categories: Computer Vision | Artificial Intelligence

2022-12-02

Data uncertainty is commonly observed in the images for face recognition (FR). However, deep learning algorithms often make predictions with high confidence even for uncertain or irrelevant inputs. Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples. Taking a probabilistic view of the current classification model, the temperature scalar is exactly the scale of uncertainty noise implicitly added in the softmax function. Meanwhile, the uncertainty of images in a dataset should follow a prior distribution. Based on the observation, a unified framework for uncertainty modeling and FR, Random Temperature Scaling (RTS), is proposed to learn a reliable FR algorithm. The benefits of RTS are two-fold. (1) In the training phase, it can adjust the learning strength of clean and noisy samples for stability and accuracy. (2) In the test phase, it can provide a score of confidence to detect uncertain, low-quality and even OOD samples, without training on extra labels. Extensive experiments on FR benchmarks demonstrate that the magnitude of variance in RTS, which serves as an OOD detection metric, is closely related to the uncertainty of the input image. RTS can achieve top performance on both the FR and OOD detection tasks. Moreover, the model trained with RTS can perform robustly on datasets with noise. The proposed module is light-weight and only adds negligible computation cost to the model.

translated by Google Translate
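
The abstract's starting point, that the softmax temperature sets how much uncertainty a classifier expresses, can be illustrated with plain temperature scaling. The sketch below is not the RTS method itself, which treats the temperature as a random quantity with a prior; it only shows, on made-up logits, that a larger temperature flattens the predicted distribution and lowers the top-class confidence. The function name and the example values are assumptions for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-identity classifier.
logits = [4.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: confident
flat = softmax_with_temperature(logits, 5.0)   # high temperature: uncertain
print(max(sharp), max(flat))
```

The ranking of the classes is the same at both temperatures; only the confidence attached to the top class changes.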

FOLIO: Natural Language Reasoning with First-Order Logic

Simeng Han , Hailey Schoelkopf , Yilun Zhao , Zhenting Qi , Martin Riddell , Luke Benson , Lucy Sun , Ekaterina Zubova , Yujie Qiao , Matthew Burtell

Categories: Natural Language Processing

2022-09-02

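We present FOLIO, a human-annotated, open-domain, logically complex and diverse dataset for natural language (NL) reasoning, equipped with first-order logic (FOL) annotations. FOLIO consists of 1,435 examples (unique conclusions), each paired with one of 487 sets of premises that serve as rules for deductively reasoning about the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their parallel FOL annotations, which are automatically verified by our FOL inference engine. In addition to the main NL reasoning task, the NL-FOL pairs in FOLIO automatically constitute a new NL-FOL translation dataset that uses FOL as the logical form. Our extensive experiments systematically evaluate the FOL reasoning ability of fine-tuned medium-sized language models (BERT, RoBERTa) and of few-shot prompting on large language models (GPT-NeoX, OPT, GPT-3, Codex). For NL-FOL translation, we experiment with GPT-3 and Codex. Our results show that one of the most capable publicly available large language models (LLMs), GPT-3 davinci, achieves only slightly better than random results on a subset of FOLIO, and that the model is especially bad at predicting the correct truth values for False and Unknown conclusions. Our dataset and code are available at https://github.com/yale-lily/folio.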

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction

Xin Huang , Xiaoyu Tian , Junru Gu , Qiao Sun , Hang Zhao

Categories: Computer Vision | Artificial Intelligence | Robotics

2022-08-09

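Predicting the future behavior of road agents is a key task in autonomous driving. Although existing models have been very successful at predicting the marginal future behavior of individual agents, efficiently predicting consistent joint behaviors of multiple agents remains a challenge. Recently, the occupancy flow fields representation was proposed to represent the joint future states of road agents through a combination of an occupancy grid and flow, which supports efficient and consistent joint prediction. In this work, we propose a novel occupancy flow fields predictor that produces accurate occupancy and flow predictions by combining the power of an image encoder, which learns features from rasterized traffic images, and a vector encoder, which captures information about continuous agent trajectories and map states. The two encoded features are fused by multiple attention modules before the final predictions are generated. Our simple but effective model ranked in the Waymo Open Dataset Occupancy and Flow Prediction Challenge and achieved the best performance in the occluded occupancy and flow prediction task.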
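
To make the occupancy-plus-flow representation concrete, here is a toy sketch that rasterizes point agents into an occupancy grid with a per-cell velocity. It is a simplification under stated assumptions, not the paper's predictor: agents are reduced to points, only the current time step is shown, and the `rasterize` function, its parameters and the per-cell flow convention are all hypothetical.

```python
def rasterize(agents, grid_size, cell_m):
    """Build an occupancy grid and a per-cell flow field from point agents.

    agents: list of (x, y, vx, vy) in metres and metres per second.
    Returns (occupancy, flow): occupancy[row][col] is 0 or 1, and
    flow[row][col] is the (vx, vy) of the agent occupying that cell.
    """
    occupancy = [[0] * grid_size for _ in range(grid_size)]
    flow = [[(0.0, 0.0)] * grid_size for _ in range(grid_size)]
    for x, y, vx, vy in agents:
        col, row = int(x // cell_m), int(y // cell_m)
        # Agents outside the grid are ignored.
        if 0 <= row < grid_size and 0 <= col < grid_size:
            occupancy[row][col] = 1
            flow[row][col] = (vx, vy)
    return occupancy, flow

# Two hypothetical agents on an 8 m x 8 m scene with 2 m cells.
agents = [(2.5, 0.5, 1.0, 0.0), (7.0, 7.9, 0.0, -2.0)]
occupancy, flow = rasterize(agents, grid_size=4, cell_m=2.0)
for row in occupancy:
    print(row)
```

A grid like this is the kind of input an image encoder can consume, while the raw `(x, y, vx, vy)` tuples are closer to what a vector encoder would see.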

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

Yuanhan Zhang , Qinghong Sun , Yichun Zhou , Zexin He , Zhenfei Yin , Kun Wang , Lu Sheng , Yu Qiao , Jing Shao , Ziwei Liu

Categories: Computer Vision

2022-03-15

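Large-scale datasets play a vital role in computer vision. However, current datasets are annotated blindly, without differentiating between samples, which makes data collection inefficient and unscalable. The open question is how to build a mega-scale dataset actively. Although advanced active learning algorithms might be the answer, we found experimentally that they perform poorly in realistic annotation scenarios where out-of-distribution data is extensive. This work therefore proposes a novel active learning framework for realistic dataset annotation. Equipped with this framework, we build a high-quality vision dataset, Bamboo, which consists of 69M image classification annotations with 119K categories and 28M object bounding box annotations with 809 categories. We organize these categories with a hierarchical taxonomy integrated from several knowledge bases. The classification annotations are four times larger than ImageNet22K, and the detection annotations are three times larger than Objects365. Compared with ImageNet22K and Objects365, models pre-trained on Bamboo achieve superior performance on various downstream tasks (a 6.2% gain on classification and a 2.1% gain on detection). We believe that our active learning framework and Bamboo are essential for future work.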

INTERN: A New Learning Paradigm Towards General Vision

Jing Shao , Siyu Chen , Yangguang Li , Kun Wang , Zhenfei Yin , Yinan He , Jianing Teng , Qinghong Sun , Mengya Gao , Jihao Liu

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2021-11-16

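Enormous waves of technological innovation over the past several years, marked by advances in AI technologies, are profoundly reshaping industry and society. However, down the road a key challenge awaits us: our ability to meet rapidly growing scenario-specific demands is severely limited by the cost of acquiring training data. This difficult situation is, in essence, due to the limitations of the mainstream learning paradigm: for each new scenario we need to train a new model, based on a large quantity of well-annotated data and usually from scratch. To tackle this fundamental problem, we move beyond that paradigm and develop a new learning paradigm named INTERN. By learning with supervisory signals from multiple sources in multiple stages, the model being trained develops strong generalizability. We evaluate our model on 26 well-known datasets that cover four categories of tasks in computer vision. In most cases, our models, adapted with only 10% of the training data in the target domain, outperform their counterparts trained with the full set of data, often by a significant margin. This is an important step towards a promising prospect in which such a model with general vision capability can dramatically reduce our reliance on data, thereby accelerating the adoption of AI technologies. Furthermore, revolving around our new paradigm, we also introduce a new data system, a new architecture, and a new benchmark, which together form a general vision ecosystem to support its future development in an open and inclusive manner.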

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi , Kai Qiao , Jian Chen , Shuai Yang , Jie Yang , Baojie Song , Linyuan Wang , Bin Yan

Categories: Computer Vision

2023-01-03

The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
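
Selecting the features "with the greatest information gain" can be sketched as follows for discrete features. This is a minimal illustration of the standard entropy-based definition, not the benchmark's actual pipeline: the abstract does not say how features were discretized, and the feature names, labels and the `top_k_features` helper are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_k_features(features, labels, k):
    """Return the names of the k features with the highest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Hypothetical toy data: four accounts, two binary profile features.
labels = ["bot", "bot", "human", "human"]
features = {"verified": [0, 0, 1, 1], "has_avatar": [1, 0, 1, 0]}
print(top_k_features(features, labels, k=1))
```

In this toy data `verified` separates the two classes perfectly while `has_avatar` carries no information about the label, so the former is ranked first.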

Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling

Penghao Wu , Li Chen , Hongyang Li , Xiaosong Jia , Junchi Yan , Yu Qiao

Categories: Computer Vision

2023-01-03

Witnessing the impressive achievements of pre-training techniques on large-scale data in computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit to mitigate the sample inefficiency problem in visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive amounts of information irrelevant to decision making, which makes the predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of our proposed approach, with improvements ranging from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.

A Multi-Source Information Learning Framework for Airbnb Price Prediction

Lu Jiang , Yuanhan Li , Na Luo , Jianan Wang , Qiao Ning

Categories: Machine Learning

2023-01-01

With the development of technology and the sharing economy, Airbnb, a well-known short-term rental platform, has become the first choice for many young people. Airbnb pricing has long been a problem worth studying. While previous studies achieve promising results, deficiencies remain: (1) the feature attributes of rentals are not rich enough; (2) the research on rental text information is not deep enough; (3) few studies predict the rental price in combination with the points of interest (POI) around the house. To address these challenges, we propose a multi-source information embedding (MSIE) model to predict Airbnb rental prices. Specifically, we first select statistical features to embed the original rental data. Second, we generate word feature vectors and emotional scores from three different kinds of text information and combine them to form the text feature embedding. Third, we use the points of interest (POI) around the rental house to generate a variety of spatial network graphs, and learn the embedding of these networks to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi-source rental representations and uses a fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.

Yuille-Poggio's Flow and Global Minimizer of polynomials through convexification by Heat Evolution

Qiao Wang

Categories: Computer Vision

2023-01-01

In this paper, we investigate the possibility of a backward-differential-flow-like algorithm which starts from the minimum of the convexified version of the polynomial. We apply the heat evolution convexification approach through Gaussian filtering, which is actually an accumulation version of Steklov's regularization. We generalize the fingerprint theory which was proposed in the theory of computer vision by A.L. Yuille and T. Poggio in the 1980s, in particular their fingerprint trajectory equation, to characterize the evolution of minimizers across the scale. On the other hand, we propose the "seesaw" polynomials $p(x|s)$ and we find a seesaw differential equation $\frac{\partial p(x|s)}{\,ds}=-\frac{1}{p''(x)}$ to characterize the evolution of the global minimizer $x^*(s)$ of $p(x|s)$ while varying $s$. Essentially, both the fingerprints $\mathcal{FP}_2$ and $\mathcal{FP}_3$ of $p(x)$, consisting of the zeros of $\frac{\partial^2 p(x,t)}{\partial x^2}$ and $\frac{\partial^3 p(x,t)}{\partial x^3}$, respectively, are independent of the seesaw coefficient $s$, upon which we define the Confinement Zone and Escape Zone. Meanwhile, varying $s$ will monotonically condition the location of the global minimizer of $p(x|s)$, and all these locations form the Attainable Zone. Based on these concepts, we prove that the global minimizer $x^*$ of $p(x)$ can be inversely evolved from the global minimizer of its convexification polynomial $p(x,t_0)$ if and only if $x^*$ is included in the Escape Zone. In particular, we give a detailed analysis for quartic and degree-six polynomials.
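
The heat-evolution convexification step can be sketched for a polynomial in closed form. The sketch assumes the normalization $\partial p/\partial t = \frac{1}{2}\,\partial^2 p/\partial x^2$ (equivalently, Gaussian filtering with variance $t$); the paper's own scaling may differ, and the function names are hypothetical. Under that assumption the solution is $p(x,t)=\sum_k \frac{(t/2)^k}{k!}\,p^{(2k)}(x)$, a series that terminates for polynomials.

```python
from math import factorial

def derivative(coeffs):
    """Differentiate a polynomial given as ascending coefficients [a0, a1, ...]."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def heat_evolve(coeffs, t):
    """Coefficients of p(x, t) with dp/dt = (1/2) d^2p/dx^2 and p(x, 0) = p(x).

    Assumed normalization; sums (t/2)^k / k! times the 2k-th derivative of p
    until the derivative vanishes, so the result is exact for polynomials.
    """
    result = [0.0] * len(coeffs)
    term = list(coeffs)
    k = 0
    while term:
        scale = (t / 2) ** k / factorial(k)
        for i, c in enumerate(term):
            result[i] += scale * c
        term = derivative(derivative(term))
        k += 1
    return result

# Double-well quartic p(x) = x^4 - 2x^2, ascending coefficients.
p = [0, 0, -2, 0, 1]
print(heat_evolve(p, 0.5))  # → [-0.25, 0.0, 1.0, 0.0, 1.0]
```

For this quartic the evolved polynomial is $x^4 + (6t-2)x^2 + 3t^2 - 2t$, so under the assumed normalization it becomes convex once $t \ge 1/3$.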

Generative Graph Neural Networks for Link Prediction

Xingping Xian , Tao Wu , Xiaoke Ma , Shaojie Qiao , Yabin Shao , Chao Wang , Lin Yuan , Yu Wu

Categories: Artificial Intelligence

2022-12-31

Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
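
For readers new to the task, the following sketch scores candidate links with the classical common-neighbors heuristic. It is explicitly not GraphLP; it is the kind of hand-crafted structural feature that the abstract contrasts with learned approaches, and the function name and toy graph are hypothetical.

```python
from itertools import combinations

def common_neighbor_scores(edges):
    """Score every non-adjacent node pair by its number of shared neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:  # only score links that are not already observed
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores

# Hypothetical undirected toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbor_scores(edges)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```

Pairs with higher scores are predicted as more likely missing links; here `("a", "d")` and `("b", "c")` each share two neighbors.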
